Article Info
Article history: Received Nov 18, 2017; Revised Jan 1, 2018; Accepted Dec 11, 2018
Keywords: Big data; Dynamic schema; Graph database

ABSTRACT
Data analysis, data management, and big data have played a major role from both social and business
perspectives over the last decade. Graph databases are currently one of the most active research
topics. A graph database is preferred for handling the dynamic and complex relationships in connected
data, where it offers better results. Every data element is represented as a node. For example, on a
social media site a person is represented as a node with properties such as name, age, likes, and
dislikes, and nodes are connected to one another by relationships represented as edges. Graph
databases are expected to benefit businesses and social networking sites that generate huge volumes
of unstructured data, because such big data requires proper and efficient computational techniques to
handle it. This paper reviews the existing graph data computational techniques and research work in
order to outline a future line of research in graph database management.

1. INTRODUCTION

Today, user data is growing rapidly because of data-generating sources such as social media
networks, and the rapid adoption of smartphones and handheld devices further accelerates data creation.
Computing over this data becomes more difficult day by day as the number of users of digital data and
networks increases manyfold [1]. Traditional databases cannot process data at this scale in real time
without added complexity, whereas in a graph database each entity is represented in a graph together
with its relationships, which speeds up processing. One use case for a graph database is content-based
data filtering. Graph databases provide better performance and data consistency; hence many
researchers are considering graph models [2].

To handle the problem of storing huge volumes of data, many researchers have proposed graphs and
graph storage, in which graphs are used to model large data sets with complicated structure. Every
graph consists of nodes, properties, and edges that express the relationships among the nodes. A graph
database for connected data also offers a significant option for dealing with structured,
semi-structured, and unstructured data [3]. A graph database can respond to a query very quickly, often
within milliseconds. Today, graph databases are widely used in retail, social networking, healthcare,
communication, and other online solutions. Operations such as create, read, update, and delete are
available in a graph database system. The drawback of these systems is that they are more expensive
than traditional methods [4].

This survey paper discusses the concepts of graph databases and reviews existing research on
computational techniques for data management. Section 2 discusses the basic concepts of graph
databases, modeling, computational techniques, and a comparison of techniques. Section 3 reviews
recent research work on graph database management and graph database computational techniques.
Section 4 identifies the research gap in recent work on graph databases. Section 5 describes the
future line of research, and finally, Section 6 concludes the paper.


2. GRAPH DATA

In recent years, the varied ways in which ordinary users use the Internet and mobile communication
have led academics and researchers to rethink how to store the huge volume of data that is generated
every day, every hour, and every minute. This need for storage and retrieval of data and information
has brought back the concepts of graphs and graph models [4], [5].

Graphs are used to model complicated structures. A graph is a collection of nodes and edges, where
the edges express the relationships between the nodes. In a graph, nodes represent entities, and these
entities can be related in many ways depending on the application. The connection between two entities
is called a relationship. The attributes attached to entities and relationships are called labels. In
a graph-like structure, data is stored in nodes, and nodes have properties. Relationships can also
carry properties, and each relationship connects one node to another.

The example in Figure 1 demonstrates a relationship between two animals. The figure identifies two
things, Thing-1 and Thing-2, which have properties such as animal type (dog and cat) and name, together
with a relationship. Thing-1 and Thing-2 are a dog and a cat, named “cute” and “handsome” respectively.
Finally, the relationship between the dog and the cat is stated on the basis of their shared animal
category.
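
The structure described for Figure 1 can be sketched as a small property graph. This is only an illustrative sketch: the class names (`Node`, `Relationship`, `PropertyGraph`) and the relationship type `SAME_CATEGORY` are assumptions introduced here, not names taken from the figure or from any graph database product.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    source: str
    target: str
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Nodes and relationships, both of which may carry properties."""

    def __init__(self):
        self.nodes = {}            # node id -> Node
        self.relationships = []    # list of Relationship

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)
        return self.nodes[node_id]

    def add_relationship(self, source, target, rel_type, **properties):
        # Refuse dangling relationships: both endpoints must already exist.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        rel = Relationship(source, target, rel_type, properties)
        self.relationships.append(rel)
        return rel

    def related(self, node_id):
        """Return the ids of nodes connected to node_id, ignoring direction."""
        found = set()
        for rel in self.relationships:
            if rel.source == node_id:
                found.add(rel.target)
            elif rel.target == node_id:
                found.add(rel.source)
        return found


graph = PropertyGraph()
graph.add_node("Thing-1", animal_type="dog", name="cute")
graph.add_node("Thing-2", animal_type="cat", name="handsome")
# "SAME_CATEGORY" is a hypothetical relationship type chosen for this sketch.
graph.add_relationship("Thing-1", "Thing-2", "SAME_CATEGORY", category="animal")

print(graph.related("Thing-1"))
```

Keeping properties on both nodes and relationships mirrors the description above; a production system would add indexing and persistence, which are omitted here.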


Figure 1. Example of Graph data
Figure 2. Units of graph database system


2.1. Graph Databases 

A graph database system has four units, namely creation, reading, updating, and removal, which
are used in designing a graph data model. An index-free adjacency representation of the finite graph is
necessary for high-performance graph traversal. A graph database uses a square adjacency matrix or an
adjacency structure in which each node maintains direct relationships with its adjacent nodes. The graph
database exposes a single data structure, the graph, and it has no join operation, since each element is
connected directly to its neighbours. The graph stores data in nodes that are connected by
relationships. Data in a graph database follows the property graph model. A graph-oriented database is
a specialized NoSQL database in which the relationships among nodes are stored and managed generically.
Built-in support for relationships makes traversal much faster for multidimensional, interconnected
data sets, which makes graph databases suitable for online transaction processing (OLTP). For the same
reason, products built from the ground up as graph stores, such as Neo4j, offer multifold performance
benefits compared with multilayer abstractions over traditional technologies such as relational
databases (RDB) and object-oriented databases (OODB). A graph database also reduces the complexity of
design and implementation; a popular saying is “if you can whiteboard it, you can graph it.” As a
high-level abstraction over the network database model, it is reported to reduce coding effort to
one-tenth, and it is a key technology in rapid application development (RAD) [5], [6].
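
The four units (create, read, update, remove) and traversal over direct adjacency references can be sketched as follows. This is a minimal in-memory sketch under stated assumptions: the class `TinyGraphStore` and its method names are hypothetical, edges are undirected, and nothing here reflects the storage layout of any real product.

```python
from collections import deque


class TinyGraphStore:
    """Toy store: each node keeps direct references to its neighbours."""

    def __init__(self):
        self._props = {}   # node id -> property dict
        self._adj = {}     # node id -> set of adjacent node ids

    def create(self, node_id, **props):
        if node_id in self._props:
            raise ValueError(f"node {node_id!r} already exists")
        self._props[node_id] = dict(props)
        self._adj[node_id] = set()

    def read(self, node_id):
        return dict(self._props[node_id])

    def update(self, node_id, **props):
        self._props[node_id].update(props)

    def delete(self, node_id):
        # Remove the node and detach it from every neighbour.
        for other in self._adj.pop(node_id):
            if other != node_id:
                self._adj[other].discard(node_id)
        del self._props[node_id]

    def connect(self, a, b):
        self._adj[a].add(b)
        self._adj[b].add(a)

    def within_hops(self, start, max_hops):
        """Breadth-first traversal that follows adjacency sets directly."""
        seen = {start}
        frontier = deque([(start, 0)])
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_hops:
                continue
            for nxt in self._adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
        seen.discard(start)
        return seen


store = TinyGraphStore()
for name in ("a", "b", "c", "d"):
    store.create(name)
store.connect("a", "b")
store.connect("b", "c")
store.connect("c", "d")

two_hops = store.within_hops("a", 2)
print(sorted(two_hops))
```

Each hop costs a lookup in the current node's adjacency set rather than a join over a global table, which is the intuition behind the traversal speed claimed above.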

Graph databases are quickly moving from research laboratories into real-world use; many social
networking enterprises such as Twitter, Facebook, and Google adopted them years ago. Not only
scientific data but also the web and many other kinds of data can be modeled as a graph. This helps to
overcome limitations of RDBMSs, such as the predefined schema, and to process complex queries in
milliseconds. In particular, the lack of a fixed schema allows developers to achieve high productivity,
besides providing the capability to process complex multilevel queries in real time. E-commerce sites
and their users benefit from easy processing of product recommendations. Machine learning algorithms
are used most heavily in applications such as these, where big data analytics is employed by the global
top 100 companies. Bug localization is another application area of graph databases worth mentioning.
Overall, there is a variety of domains where graph data modeling can be applied to improve the user
experience [7].


2.2. Existing types of Graph Database Models 

In the recent past, many tools have been developed using the graph database concept, for example
Neo4j and Sparksee [8]. Tools such as Oracle Spatial and Graph, OQGraph, and ArangoDB are designed as
an abstraction over the underlying architecture of relational databases such as MySQL and Oracle [9].
Until now, there has been no industry standard [10], and moreover many of these tools are designed for
a particular domain [11]. In the in-memory model, scalability is limited because memory holds the
content [12]. Another source of inefficiency in these models is their horizontal scaling and layering
mechanisms. A new paradigm is required for handling very large data; very few models are designed to
support both parallelism and OLTP. The early models lack a standard query language, application
programming interface, and protocols comparable to SQL, JDBC, and REST in conventional models. Lately,
Gremlin and SPARQL have been gaining consensus, but adoption is slow.


2.3. Neo4j (Neo Technology) 

Neo4j is a disk-based transactional graph database, promoted as the “world’s leading graph
database.” Its first release was in 2007. In addition to Java, Neo4j supports other languages such as
Python for graph operations. Neo4j is an open-source project [7] available in a GPLv3 Community
edition, with Advanced and Enterprise editions available under both the AGPLv3 and a commercial
license. Neo4j is well suited to enterprise deployment: it scales to billions of nodes and
relationships in a network. Neo4j performs all operations that modify data within a transaction. In
Neo4j, both nodes and relationships can contain properties. Neo4j manages graphs and is optimized for
graph structures instead of tables. It is an expressive type of graph database and is otherwise similar
to other graph databases. Neo4j is one of the most popular graph databases today [8].


2.4. HyperGraphDB

HyperGraphDB is an open-source database that supports hypergraphs. A hypergraph [8] differs from a
normal graph in that an edge can point to other edges. It is used for modeling graph data in various
fields. It supports online querying through an API written in Java. It is based on the HyperGraphDB
model, a universal data model for highly complex, large-scale knowledge applications. It has
graph-oriented storage and customizable indexing. In this graph database, a hyperedge is easy to
convert into a tuple. It is a distributed, graph-oriented database [8-14].
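
The two properties mentioned above, edges that point to other edges and hyperedges that convert to tuples, can be sketched in a few lines. This is an assumed toy model, not the HyperGraphDB Java API: the class `HyperGraph` and the methods `add_atom`, `as_tuple`, and `incident` are hypothetical names for illustration.

```python
class HyperGraph:
    """Toy hypergraph: every element is an atom; an atom with targets is a hyperedge."""

    def __init__(self):
        self._values = {}    # atom id -> stored value
        self._targets = {}   # atom id -> tuple of target atom ids (empty for plain nodes)
        self._next_id = 0

    def add_atom(self, value, targets=()):
        targets = tuple(targets)
        for target in targets:
            if target not in self._values:
                raise KeyError(f"unknown target atom {target}")
        atom_id = self._next_id
        self._next_id += 1
        self._values[atom_id] = value
        self._targets[atom_id] = targets
        return atom_id

    def as_tuple(self, atom_id):
        """Flatten a hyperedge into (value, target value, target value, ...)."""
        return (self._values[atom_id],) + tuple(
            self._values[t] for t in self._targets[atom_id]
        )

    def incident(self, atom_id):
        """Return ids of hyperedges that point at the given atom."""
        return [a for a, targets in self._targets.items() if atom_id in targets]


hg = HyperGraph()
alice = hg.add_atom("alice")
bob = hg.add_atom("bob")
carol = hg.add_atom("carol")
# One hyperedge connecting three nodes at once.
meeting = hg.add_atom("meeting", [alice, bob, carol])
# A hyperedge whose first target is itself a hyperedge.
report = hg.add_atom("reported_by", [meeting, carol])

print(hg.as_tuple(meeting))
print(hg.as_tuple(report))
```

Treating nodes and edges uniformly as atoms is what allows `report` to reference `meeting` without any special case.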


2.5. DEX 

DEX [15] is reported to be a very efficient, bitmap-based graph database written in C++. It was
first released in 2008. It makes graph querying possible for different kinds of networks, with
applications such as social network analysis and pattern recognition. It is also known as a
high-performance graph database for large graphs and is useful for most NoSQL applications. The latest
version of DEX supports both Java and .NET programming. It is portable and requires only a single JAR
file for execution. DEX is described as the fourth most popular graph database today [3], [6].
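
To illustrate what a bitmap-based representation means in general, the following sketch keeps one bitmap per attribute value and answers queries with bitwise operations. It is an assumption-laden illustration only: the internal layout of DEX is not described in this survey, and `BitmapIndex`, `bitmap`, and `ids` are hypothetical names. Python integers stand in for bitmaps.

```python
class BitmapIndex:
    """Maps (attribute, value) to a bitmap whose set bits are object ids."""

    def __init__(self):
        self._bitmaps = {}

    def set(self, object_id, attribute, value):
        key = (attribute, value)
        self._bitmaps[key] = self._bitmaps.get(key, 0) | (1 << object_id)

    def bitmap(self, attribute, value):
        # An unknown attribute value yields the empty bitmap.
        return self._bitmaps.get((attribute, value), 0)


def ids(bitmap):
    """Decode a bitmap into the sorted list of object ids it contains."""
    result = []
    object_id = 0
    while bitmap:
        if bitmap & 1:
            result.append(object_id)
        bitmap >>= 1
        object_id += 1
    return result


index = BitmapIndex()
index.set(0, "type", "person")
index.set(1, "type", "person")
index.set(2, "type", "movie")
index.set(3, "type", "person")
index.set(0, "likes", "jazz")
index.set(3, "likes", "jazz")

persons = index.bitmap("type", "person")
jazz_fans = index.bitmap("likes", "jazz")

# Conjunctive query: a single bitwise AND over two bitmaps.
print(ids(persons & jazz_fans))
```

Set operations over whole collections of objects reduce to a few machine-level bitwise instructions, which is the usual argument for bitmap representations on large graphs.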


2.6. Trinity 

Trinity is a distributed graph system [9] built over a memory cloud. The memory cloud is a globally
addressable, in-memory key-value store over a cluster of machines. It provides fast data access for
large data sets. Trinity is a large-scale graph processing engine that provides fast graph exploration
and parallel computing for large data sets. It also provides high throughput on large graphs with a
billion nodes.
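
The idea of a globally addressable key-value store spread over several machines can be sketched with in-process dictionaries standing in for machines. This is a simplified illustration under assumptions: the class `MemoryCloud`, the modulo placement rule, and the method names are hypothetical and are not Trinity's actual addressing scheme or API.

```python
class MemoryCloud:
    """Toy partitioned key-value store; each dict plays the role of one machine."""

    def __init__(self, machine_count):
        self._machines = [dict() for _ in range(machine_count)]

    def machine_for(self, cell_id):
        # Assumed placement rule: integer cell ids are spread by modulo.
        return cell_id % len(self._machines)

    def put(self, cell_id, neighbours):
        self._machines[self.machine_for(cell_id)][cell_id] = list(neighbours)

    def get(self, cell_id):
        # Any cell can be addressed by id alone, wherever it is stored.
        return self._machines[self.machine_for(cell_id)][cell_id]

    def explore(self, start, hops):
        """Level-by-level exploration that crosses partitions transparently."""
        seen = {start}
        frontier = {start}
        for _ in range(hops):
            next_frontier = set()
            for cell_id in frontier:
                for neighbour in self.get(cell_id):
                    if neighbour not in seen:
                        seen.add(neighbour)
                        next_frontier.add(neighbour)
            frontier = next_frontier
        return seen - {start}


cloud = MemoryCloud(machine_count=3)
cloud.put(1, [2, 3])
cloud.put(2, [4])
cloud.put(3, [4])
cloud.put(4, [5])
cloud.put(5, [])

reached = cloud.explore(1, hops=2)
print(sorted(reached))
```

In a real cluster each `get` may involve a network round trip, so the cells of one frontier level would typically be fetched in parallel per machine.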


2.7. Infinite Graph (Objectivity)

Infinite Graph is produced by a company called Objectivity, which develops database technologies
supporting large-scale object persistence and relationship analytics. Infinite Graph is a distributed
graph database implemented in Java and based on a graph-like structure. It can be described as a
cloud-enabled graph database. It is designed to handle very high throughput. It is a single graph
database distributed across multiple machines. A lock server handles lock requests from database
applications. It is capable of dealing with complex relationships that require multiple hops. It
provides graph-wise indexes on multiple key fields and also offers high query performance [7], [10].


2.8. Titan 

Titan [9] was introduced in 2012. It is an open-source project written in Java. The main benefit
of Titan is its scalability: it supports very large graphs and scales with the number of machines in a
cluster. It is also highly scalable with respect to the number of concurrent users and the size of the
graph. It provides batch graph processing with the Hadoop framework and answers complex queries in
milliseconds. It consists of three main components:

a. Native Blueprints implementation

b. Gremlin query language

c. Rexster server

It follows the property graph model and supports Gremlin, a graph traversal query language. It also
offers an optimized disk representation for efficient use of storage and fast data access.
Applications can interact with Titan in two main ways:

a. By calling Titan's Java-language APIs, which include its native API implementation.

b. By using TinkerPop stack utilities, such as the Gremlin query language, built atop Blueprints.


3. RECENT RESEARCH SURVEY

The research in the domain of graph data is classified into ten different categories on the basis of
work published in IEEE Xplore journals. The categorization is given below.


3.1. Data Store Efficiency

Several issues arise in improving the efficiency of data storage. Among these, data compression
is necessary in order to store more data. Data standardization can also play a major role in mapping
and translating data for cloud storage. For large graph collections, supergraph search is required to
select data graph features. Recent work in this category is summarized in Table 1.


Table 1. Work for data store efficiency

Author | Issue considered | Method adopted | Result
Sutrisna et al. [4] | Data compression | Graph clustering algorithm with data set of collaboration data among journal writers | Lossless compressed graph data is achieved
Bansal et al. [5] | Data standardization and classification | Online compression algorithm | Reduction in number of nodes and achieved graph database compression for less storage space
Lyu et al. [6] | Super graph search | Indexing and query processing algorithms, CCD dataset and NCI dataset | Indexing and processing time
Chen and Chen [7] | Supporting reachability queries | Decomposition of graphs | High efficiency, effectiveness and querying time

3.2. Database Indexing Method

Significant work has been carried out to facilitate isomorphism and similarity queries, to build
efficient graph database systems, and to accelerate graph similarity search. Table 2 lists some of the
selected work on database indexing methods.


Table 2. Work for database indexing method

Author | Issue considered | Method adopted | Result
Williams et al. [8] | Graph storage, similarity and indexing mechanism | Sub-graph isomorphism and similarity queries by using protein motifs and synthetic datasets | Achieved improved query times for sub-graph isomorphism queries.
Luo et al. [9] | Large scale graph database indexing and search approach | T-mixture model (combination of optimized vector quantizer and probabilistic approximate-based indexing scheme) | Robust in handling outliers.
Yuan et al. [10] | Graph feature mining | Query grouping mechanism | Achieved better, faster and lightweight filtering
Bei et al. [11] | Graph search | Distributed graph searching mechanism | Achieved distributed graph database
Goldberg et al. [12] | Problem of fragment identification | Heuristic mechanism | Achieves optimized running time


3.3. Graph Indexing Method
Graph data modeling has been improved for different kinds of data. Table 3 summarizes the work
performed on graph data modeling and graph-based management systems.


Table 3. Work for graph indexing method

Author | Issue considered | Method adopted | Result
Dongoran et al. [13] | Data modeling | Index construction, database filtering, sub-graph matching | Achieved more path length, more indexing time
Kang et al. [14] | Dynamic graphs storage and manage | Graph based database management system | Robust in handling outliers.


3.4. Sub-graph matching method

This part summarizes research ideas presented by many researchers on data querying and sub-graph
matching. Recent work on improving sub-graph matching is presented in Table 4, which also gives an
overview of various graph data techniques.


Table 4. Work for sub-graph matching method

Author | Issue considered | Method adopted | Result
Giugno and Shasha [15] | Graph querying | Regular expression graph query language that combines Xpath and Smart; hash-based finger-printing | Performs well for small query graphs on large graph databases (in the thousands)
Bröcheler et al. [16] | Sub-graph matching | Probabilistic method to estimate probabilities; partition algorithm for creating index | Works efficiently, answering 778M edge real-world SN in under one second.
Bröcheler et al. [17] | Approximate matching | PMATCH algorithm | Efficient and scales to over a billion edges.
Bröcheler et al. [18] | Sub-graph matching; long-tailed degree distributions | Delicious social book-marking service | Faster than static cost models for warm caches.
Hong et al. [19] | Set similarity sub-graph | Set similarity pruning and structure-based pruning; dominating-set-based sub-graph matching; inverted pattern lattice and structural signature buckets are designed | Outperforms state-of-the-art methods by an order of magnitude
Hoksza and Jelinek [20] | Protein-protein interface (PPI) identification | Knowledge-based approach using Neo4j for mining protein graphs | In comparison to Microsoft SQL Server, Neo4j is a viable option for small, sub-graph query types
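
As background for the sub-graph matching problem that the works in Table 4 address, the following sketch enumerates all embeddings of a small pattern graph in a target graph by plain backtracking with a degree filter. It is a deliberately naive baseline written for illustration, not the method of any cited paper; the function name `find_subgraph_matches` and the adjacency-dictionary input format are assumptions.

```python
def find_subgraph_matches(pattern, target):
    """Return every mapping of pattern nodes onto distinct target nodes
    such that each pattern edge is also an edge in the target.

    Both graphs are undirected and given as {node: set_of_neighbours}.
    """
    # Match high-degree pattern nodes first so that dead ends show up early.
    order = sorted(pattern, key=lambda node: -len(pattern[node]))
    matches = []

    def extend(mapping):
        if len(mapping) == len(order):
            matches.append(dict(mapping))
            return
        p_node = order[len(mapping)]
        used = set(mapping.values())
        for t_node in target:
            if t_node in used:
                continue
            # Degree filter: the candidate needs at least as many neighbours.
            if len(target[t_node]) < len(pattern[p_node]):
                continue
            # Every already-mapped neighbour of p_node must be adjacent to t_node.
            if all(mapping[q] in target[t_node]
                   for q in pattern[p_node] if q in mapping):
                mapping[p_node] = t_node
                extend(mapping)
                del mapping[p_node]

    extend({})
    return matches


# Pattern: a triangle. Target: a triangle (1, 2, 3) with a tail node 4 attached to 3.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
target_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

matches = find_subgraph_matches(triangle, target_graph)
print(len(matches))
```

The search is exponential in the worst case, which is why the surveyed approaches rely on indexing, pruning, and partitioning to scale to large graphs.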


3.5. Semantic


Recent work addressing the semantic approach, including semantic query processing and data
analysis in graph databases, is given in Table 5.


Table 5. Work for graph database semantic

Author | Issue considered | Method adopted | Result
Kivikangas and Ishizuka [21] | Semantic queries | Utilizing Universal Words (UWs), Concept Description Language (CDL) for semantic data instead of RDF, and Neo4j | Improved semantic queries
Kalmegh and Navathe [22] | High-performance graph databases | Survey | Definition, characteristics, future directions
Irawan and Prihatmanto [23] | Implementation of graph database for OpenCog | GraphBacking Store API extends Backing Store C++ API | Atom Space represents knowledge in a hypergraph structure; persisting in a graph database is more intuitive and more portable
Graves et al. [24] | Design of data store for genome | Review all available options and compare | Graph database
Cesare et al. [25] | Automated taxonomy extraction from semantic process models | Linguistic approach based on semantic similarity | Generalize elements of business processes that are automatically discovered from semi-structured data projects
Morari et al. [26] | Scaling semantic graph databases | SPARQL-to-C++ compiler, a library of distributed data system, and a custom multithreaded runtime | Better scaling
Wardani and Küng [27] | Semantic mapping relational to graph model | Create property relationship in the result of the mapping and converting process | Map and convert the relational data model to graph model without semantic loss
Souza et al. [28] | Graph databases with semantic network models for Software-Defined Networking (SDN) applications | Import the Network Markup Language (NML) model | Modeling tasks are considerably more natural compared to RDBM; to reproduce SDN application primitives
Hayakawa and Nishiyama [29] | Query processing of semantic data | Summarized graph in advance | Improve the query performance by 6.62 times
Bednar et al. [30] | Theoretical analysis and performance testing | Benchmark data and population modeling | Comparative benefits and weakness
Hartley et al. [31] | Storing, accessing and analyzing trillion vertices and edges | Semantic graphs execution of parallel out-of-core graph algorithms | Outperforms widely used open-source
Cavoto et al. [32] | Network-driven data analysis | FishBase global information system | Discovery of new information and validating the existing data
Caldarola et al. [33] | Big graph-based data visualization | The WordNet | Efficiency, effectiveness and clearness; two different representation of WordNet
Lamhaddab and Elbaamrani [34] | Graph modeling for mobiles | Implementation of an extractor module (in Java language) | Reverse engineering from iOS platform to Android platform
Leida and Chu [35] | Distributed SPARQL query answering over RDF data streams | Business process monitoring domain for query workload balancing | Approach for efficient and scalable query processing over RDF graphs distributed over a local data grid.
Mordinyi et al. [36] | Efficient data store that is capable of versioning and querying local and common concepts | NoSQL graph database | Outperforms ontology stores and match solutions relying on relational databases
Balboni et al. [37] | Evolution analysis | Natural language processing engines to build temporal graph database | Got large amount of open source documents
Wu and Chen [38] | Frequent sub-graph mining | By normalizing the incidence matrix | Achieved higher speed and efficiency
John et al. [39] | Learning process enhancement against population | Natural language processing | Enhanced learner centered online learning experience
Xu and Luo [40] | Expression-driven sketch graph matching for face recognition | Multi-layer grammatical face model | Recognition rates were improved, especially for the smiling and screaming faces whose line-edge maps are greatly distorted
Figueira and Libkin [41] | Querying graphs | Parikh automata | Real-life querying

3.6. Social Networking
Recent ideas concerning the graph data generated by social networks are presented in Table 6.


Table 6. Work for social networking graph database

Author | Issue considered | Method adopted | Result
Dayarathna and Suzumura [42] | Graph database for hybrid clouds | Distributed graph database | Achieves faster performance
Soussi [43] | Social network extraction | Graph database | Achieves better extraction
Yar and Tun [44] | Searching personnel relationship | Graph database | Relationship among persons can simply and accurately be inferred
Mir and Wright [45] | Differentially private estimator | Kronecker graph model | Generate synthetic graphs that are “similar” to the original target graphs in a privacy preserving manner.
Shrivastava and Pal [46] | Graph mining | Graph data processing, extraction and visualization | Achieved quality in processing


4. RESEARCH GAP IN THE EXISTING WORK

The research performed in recent years lacks effectiveness when applied to huge unstructured
graph databases. The methods proposed by the various authors lack efficiency, and their applicability
in real-time applications shows poor performance. There is a need for proper research into the
management of huge unstructured graph databases. The research gaps in the existing work are listed
below:

a. The existing research does not support huge unstructured graph data, which increases the
complexity of the techniques, and some performance metrics still need to be explored for graph data.

b. The existing research does not meet the challenges of hosting a graph database, such as the
dynamic nature of graph database volume, the difficulty of maintaining graph data, and the high
computational time required to evaluate graph queries.

c. Graph partitioning issues, information loss, and unnecessary computation are still unaddressed
and need to be addressed for unstructured graph databases.

d. The research mentioned above is not designed for data mining of graph databases.

e. The existing privacy preservation techniques based on data anonymity are not efficient, as this
approach does not provide theoretical evidence that the solution is effective against security
issues.


5. LINE OF RESEARCH IN FUTURE
Better graph database scalability and management of huge unstructured graph databases can be attained
through the steps below.
a. Outline the existing issues in current work.
b. Design a prototype to generate large, unstructured data for real-time applications, and implement
the respective graph theory to model the graph database.
c. Develop a novel data mining algorithm with low computational complexity for the graph database.
d. Build a cost-effective mechanism that offers privacy for huge unstructured data.
e. Compare the effectiveness of the mechanism with the existing work.


6. CONCLUSION

This survey paper gives a better idea of the attributes required to manage huge unstructured
graph databases. It presents different types of graph database such as Neo4j, DEX, and Titan. Recent
work on graph data is collected, studied, and presented in sections covering data store efficiency,
database indexing methods, graph indexing methods, sub-graph matching methods, and semantic and social
networking graph databases. From this survey of existing research, a research gap is defined, and
research ideas for improving graph databases are subsequently presented.